Abstract:
The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space, either implicitly with kernel functions or explicitly by learning linear or nonlinear transformations. However, all of these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we propose a novel deep learning architecture, called Deep Similarity-Enhanced K Nearest Neighbors (DSE-KNN), that learns an optimized similarity function over the data directly toward the goal of optimizing KNN decision making. In other words, the similarity function used in our method is not pre-determined but rather learned, mapping the data to a high-dimensional feature space where the accuracy of KNN decision making is maximized. Experimental results show that DSE-KNN outperforms other common machine learning methods in classifying different types of disease datasets and predicting the daily price direction of different stock ETFs.
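The contrast the abstract draws, a fixed metric versus a (learned) feature space, can be sketched as follows. This is a minimal illustration, not the DSE-KNN architecture itself: the `embed` parameter merely stands in for the learned deep mapping and defaults to the identity, i.e. plain Euclidean KNN.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, embed=None):
    """KNN majority vote in a (possibly transformed) feature space.

    `embed` is a placeholder for a learned mapping such as the one
    DSE-KNN optimizes; by default it is the identity.
    """
    if embed is None:
        embed = lambda X: X
    Z_train, Z_test = embed(X_train), embed(X_test)
    preds = []
    for z in Z_test:
        d = np.linalg.norm(Z_train - z, axis=1)       # distances in feature space
        nn = np.argsort(d)[:k]                        # indices of k nearest neighbors
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])       # majority vote
    return np.array(preds)

# toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([[0.05, 0.1], [5.0, 5.1]]), k=3))  # -> [0 1]
```

Swapping in a trained embedding for `embed` is all that changes when the similarity is learned rather than fixed.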
Abstract:
In this paper, a new classification method for enhancing the performance of K-Nearest Neighbor is proposed which uses robust neighbors in the training data. The robust neighbors are detected using a validation process, making the method more robust than traditional equivalents. The new classification method is called Modified K-Nearest Neighbor. Inspired by the traditional KNN algorithm, the main idea is to classify test samples according to the tags of their neighbors. The method is a kind of weighted KNN in which the weights are determined using a different procedure: for each training sample, the procedure computes the fraction of its neighbors that carry the same label out of the total number of neighbors. The proposed method is evaluated on a variety of standard UCI data sets. Experiments show a substantial improvement in accuracy compared with the KNN method.
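The weighting procedure described above (the fraction of same-labeled neighbors) can be sketched as follows. This is a minimal reading of the abstract, not the authors' reference implementation; the function and variable names are ours.

```python
import numpy as np

def validity_weights(X, y, k=3):
    """Weight for each training sample: the fraction of its k nearest
    neighbors (excluding itself) that share its label."""
    w = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # never count the point itself
        nn = np.argsort(d)[:k]
        w[i] = np.mean(y[nn] == y[i])
    return w

def mknn_predict(X, y, x_query, k=3):
    """Weighted KNN vote: each neighbor contributes its validity weight
    instead of a plain unit vote."""
    w = validity_weights(X, y, k)
    nn = np.argsort(np.linalg.norm(X - x_query, axis=1))[:k]
    scores = {}
    for i in nn:
        scores[y[i]] = scores.get(y[i], 0.0) + w[i]
    return max(scores, key=scores.get)

# a mislabeled point (index 3) gets validity 0, so it cannot sway the vote
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.05, 0.05], [5, 5], [5.1, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
```

The effect is that noisy training points, whose neighborhoods disagree with their own label, carry little or no weight in the final vote.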
Abstract:
In general, codebook generation for vector quantization emphasizes two major goals: minimizing the distortion error to improve the quality of the reconstructed image, and reducing the time cost to improve efficiency. LBG is one of the best-known codebook generation techniques proposed in recent decades and has been widely used due to its simplicity. However, it only guarantees a local optimum. As an alternative to LBG, the pairwise nearest neighbor (PNN) algorithm was devised to obtain better results, but because of its full search operation, PNN requires a large amount of computation. In this paper, a Two-Phase Codebook Generation (TPCG) technique based on relative pixel magnitudes is presented. After a preprocessing step of image decomposition, TPCG applies a simple quantifier to partition low-frequency blocks in linear time, and then employs a newly proposed k-nearest neighbor graph construction approach with a Double Linked Algorithm, instead of PNN, for high-frequency blocks. The experiments reveal that TPCG achieves accuracy approximating that of PNN while keeping the time cost low.
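For context, the LBG baseline the abstract compares against is a generalized Lloyd iteration: alternate nearest-codeword assignment and centroid update. The sketch below is our own minimal illustration of that baseline (not the TPCG method), and its convergence to a local optimum only is exactly the limitation the abstract points out.

```python
import numpy as np

def lbg_codebook(blocks, codebook_size, iters=20, seed=0):
    """Minimal LBG (generalized Lloyd) sketch for vector quantization.

    blocks: (n, dim) array of training vectors (e.g. flattened image blocks).
    Returns a (codebook_size, dim) codebook; guarantees a local optimum only.
    """
    rng = np.random.default_rng(seed)
    # initialize the codebook with randomly chosen training vectors
    codebook = blocks[rng.choice(len(blocks), codebook_size, replace=False)].astype(float)
    for _ in range(iters):
        # assignment step: nearest codeword for every block
        d = np.linalg.norm(blocks[:, None, :] - codebook[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # update step: each codeword becomes the centroid of its cell
        for j in range(codebook_size):
            members = blocks[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook
```

On two well-separated clusters the iteration recovers the cluster centers; on realistic image data the result depends on initialization, which motivates alternatives such as PNN and TPCG.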
Abstract:
Reverse k-Nearest Neighbor (RkNN) queries have received considerable attention in recent years. Most state-of-the-art methods use two-step (filter-refinement) RkNN processing. However, for a large k, the amount of computation becomes very heavy, especially in the filter step, which is not acceptable for most mobile devices. A new filter strategy called BRC, with two pruning heuristics, is proposed to handle the filter step of RkNN queries. The experiments show that the processing time of BRC remains acceptable for most mobile devices even when k is large. We also extend BRC to continuous RkNN queries.
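As background for what the filter step must avoid, a brute-force RkNN baseline checks every point against every other. The sketch below is our own illustration of that naive baseline, not the BRC strategy: a point p is a reverse k-nearest neighbor of the query q when q ranks within p's k nearest neighbors.

```python
import numpy as np

def rknn(points, q, k):
    """Brute-force reverse k-nearest-neighbor query.

    Returns indices of data points that have q among their k nearest
    neighbors. Every candidate triggers a full scan, which is the cost
    that filter-refinement methods try to prune away.
    """
    result = []
    for i, p in enumerate(points):
        d_pq = np.linalg.norm(p - q)
        d = np.linalg.norm(points - p, axis=1)  # distances from p to all data points
        d[i] = np.inf                           # p is not its own neighbor
        kth = np.sort(d)[k - 1]                 # p's k-th nearest data distance
        if d_pq <= kth:                         # q ranks within p's top k
            result.append(i)
    return result

pts = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
print(rknn(pts, np.array([0.5, 0.0]), 1))  # -> [0, 1]
```

Note the asymmetry with ordinary kNN: the two right-hand points are closer to each other than to the query, so neither has the query among its nearest neighbors.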
Abstract:
In this paper, a new classification method for enhancing the performance of K-Nearest Neighbor is proposed which uses robust neighbors in the training data. The new classification method is called Modified K-Nearest Neighbor, MKNN. Inspired by the traditional KNN algorithm, the main idea is to classify test samples according to the tags of their neighbors. The method is a kind of weighted KNN in which the weights are determined using a different procedure: for each training sample, the procedure computes the fraction of its neighbors that carry the same label out of the total number of neighbors. The proposed method is evaluated on five different data sets. Experiments show a substantial improvement in accuracy compared with the KNN method.
Abstract:
The outlier detection algorithm based on reverse k-nearest neighbors can detect isolated points. However, the time complexity of finding the k nearest neighbors is O(kN²), which is not suitable for large data sets, and the selection of the parameter k has a great impact on the outliers obtained in large data sets. This paper uses an adaptive method to determine the parameter k and proposes an efficient pruning method based on the triangle inequality, which reduces the computation required to detect outliers. The theoretical analysis and experimental results demonstrate the feasibility and efficiency of the algorithm.
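The triangle-inequality pruning idea can be illustrated with a single pivot: since |d(x,c) − d(q,c)| ≤ d(x,q), precomputed distances to a pivot c give a cheap lower bound that rules candidates out without computing their exact distance to the query. The sketch below is our own minimal illustration of that principle, not the paper's algorithm; the pivot choice and names are ours.

```python
import numpy as np

def knn_with_pruning(points, q_idx, k):
    """k nearest neighbors of points[q_idx], skipping exact distance
    computations via a triangle-inequality lower bound.

    Returns (neighbor indices, number of candidates pruned).
    """
    pivot = points[0]                                   # arbitrary pivot choice
    d_pivot = np.linalg.norm(points - pivot, axis=1)    # one O(n) pass
    q = points[q_idx]
    dq = d_pivot[q_idx]
    best = []                                           # (distance, index), size <= k
    skipped = 0
    for i in range(len(points)):
        if i == q_idx:
            continue
        lower = abs(d_pivot[i] - dq)        # lower bound on d(points[i], q)
        if len(best) == k and lower >= max(best)[0]:
            skipped += 1                    # pruned: cannot beat current k-th best
            continue
        d = np.linalg.norm(points[i] - q)   # exact distance only when needed
        best.append((d, i))
        best.sort()
        best = best[:k]
    return [i for _, i in best], skipped
```

On clustered data most far-away candidates fail the bound, so the expensive exact distance is computed for only a fraction of the points; the same bound underlies the paper's reduction of the reverse-kNN computation.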